HapZipper: sharing HapMap populations just got easier
Authors
Abstract
The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2.
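As a rough illustration of the generic-tool benchmarks the abstract mentions, the sketch below measures compressed size as a fraction of original size using Python's standard gzip, bz2 and lzma modules. It does not reproduce HapZipper's method. The function name `compression_ratios` and the toy input are assumptions made for this example, and the input is not real HapMap data.

```python
import bz2
import gzip
import lzma


def compression_ratios(data: bytes) -> dict[str, float]:
    """Return compressed size divided by original size for each generic tool."""
    if not data:
        raise ValueError("data must be non-empty")
    compressors = {
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
        "lzma": lzma.compress,
    }
    return {name: len(fn(data)) / len(data) for name, fn in compressors.items()}


# Hypothetical toy input: a repetitive genotype-like text, NOT real HapMap data.
sample = b"rs0000001 A/G 0 1 1 0 0 1 0 0\n" * 5000
ratios = compression_ratios(sample)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2%} of original size")
```

A ratio computed this way on actual HapMap files would be the baseline that a dedicated tool has to beat. The toy input is far more repetitive than real genotype data, so its ratios say nothing about real-world performance.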
Similar resources
Npgrj_ng_1911 1251..1260
Recent genomic surveys have produced high-resolution haplotype information, but only in a small number of human populations. We report haplotype structure across 12 Mb of DNA sequence in 927 individuals representing 52 populations. The geographic distribution of haplotypes reflects human history, with a loss of haplotype diversity as distance increases from Africa. Although the extent of linkag...
A Genetic Population Isolate in The Netherlands Showing Extensive Haplotype Sharing and Long Regions of Homozygosity
Genetic isolated populations have features that may facilitate genetic analyses and can be leveraged to improve power of mapping genes to complex traits. Our aim was to test the extent to which a population with a former history of geographic isolation and religious endogamy, and currently with one of the highest fertility rates in The Netherlands, shows signs of genetic isolation. For this pur...
Navigating the HapMap
With the availability of the HapMap, a resource which describes common patterns of linkage disequilibrium (LD) in four different human population samples, we now have a powerful tool to help dissect the role of genetic variation in the biology of the genome. HapMap is entirely complementary to the human genome map and so it is particularly fitting that it should be viewed in a full genomic cont...
Unexpected Relationships and Inbreeding in HapMap Phase III Populations
Correct annotation of the genetic relationships between samples is essential for population genomic studies, which could be biased by errors or omissions. To this end, we used identity-by-state (IBS) and identity-by-descent (IBD) methods to assess genetic relatedness of individuals within HapMap phase III data. We analyzed data from 1,397 individuals across 11 ethnic populations. Our results su...
Meiosis: Making a Synaptonemal Complex Just Got Easier
In preparation for meiosis, chromosomes go through several massive structural transitions, including chromosome fragmentation, pairing and synapsis. A checkpoint factor and a SUMO ligase collaborate to keep things in order.
Journal title:
Volume 40, issue
Pages -
Publication date: 2012